AITopics | rl generalization

Avalon: A Benchmark for RL Generalization Using Procedurally Generated Worlds

Neural Information Processing SystemsDec-24-2025, 05:18:47 GMT

Despite impressive successes, deep reinforcement learning (RL) systems still fall short of human performance on generalization to new tasks and environments that differ from their training. As a benchmark tailored for studying RL generalization, we introduce Avalon, a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gathering food, and avoiding hazards. Avalon is unique among existing RL benchmarks in that the reward function, world dynamics, and action space are the same for every task, with tasks differentiated solely by altering the environment; its 20 tasks, ranging in complexity from eat and throw to hunt and navigate, each create worlds in which the agent must perform specific skills in order to survive. This setup enables investigations of generalization within tasks, between tasks, and to compositional tasks that require combining skills learned from previous tasks. Avalon includes a highly efficient simulator, a library of baselines, and a benchmark with scoring metrics evaluated against hundreds of hours of human performance, all of which are open-source and publicly available. We find that standard RL baselines make progress on most tasks but are still far from human performance, suggesting Avalon is challenging enough to advance the quest for generalizable RL.

avalon, procedurally generated world, rl generalization, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)

Add feedback

Avalon: A Benchmark for RL Generalization Using Procedurally Generated Worlds

Neural Information Processing SystemsOct-11-2024, 02:09:56 GMT

Despite impressive successes, deep reinforcement learning (RL) systems still fall short of human performance on generalization to new tasks and environments that differ from their training. As a benchmark tailored for studying RL generalization, we introduce Avalon, a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gathering food, and avoiding hazards. Avalon is unique among existing RL benchmarks in that the reward function, world dynamics, and action space are the same for every task, with tasks differentiated solely by altering the environment; its 20 tasks, ranging in complexity from eat and throw to hunt and navigate, each create worlds in which the agent must perform specific skills in order to survive. This setup enables investigations of generalization within tasks, between tasks, and to compositional tasks that require combining skills learned from previous tasks. Avalon includes a highly efficient simulator, a library of baselines, and a benchmark with scoring metrics evaluated against hundreds of hours of human performance, all of which are open-source and publicly available.

avalon, procedurally generated world, rl generalization, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.62)

Add feedback

An Empirical Study on Hyperparameters and their Interdependence for RL Generalization

Song, Xingyou, Du, Yilun, Jackson, Jacob

arXiv.org Artificial IntelligenceJun-2-2019

Recent results in Reinforcement Learning (RL) have shown that agents with limited training environments are susceptible to a large amount of overfitting across many domains. A key challenge for RL generalization is to quantitatively explain the effects of changing parameters on testing performance. Such parameters include architecture, regularization, and RL-dependent variables such as discount factor and action stochasticity. We provide empirical results that show complex and interdependent relationships between hyperparameters and generalization. We further show that several empirical metrics such as gradient cosine similarity and trajectory-dependent metrics serve to provide intuition towards these results.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1906.00431

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology: